The Goal

The entire goal of my project was to predict when an NBA player will have his peak season.

I did this entirely based on PER, or Player Efficiency Rating. My initial plan was to come up with a model or a new way to calculate something very similar, but I didn’t realize how complicated and accurate PER already was. Here is a picture of the formula analysts use to find PER. Pretty dang complicated, right?

Tidying the Data

The data was already organized fairly well, but I still had to select the data I wanted once I finished tidying.

I filtered the data for multiple reasons.

First, I only wanted data from the 1997-2023 seasons. Before 1997 the data wasn’t complete, and I left out the 2024 season because it is still in progress and does not have complete data either.

I kept only player seasons with more than 1500 minutes played to filter out bench players, who would skew my PER predictions later. That works out to a player averaging roughly 18-24 minutes a game (1500 minutes is about 18 a game over a full 82-game season, or about 24 a game over 62 games), which also gives a buffer for starters who missed time with injuries.

I also required a player to have played more than 30 games in a season, for similar reasons as the minutes filter.

I also had to filter on position played, because some players were listed at more than one position.
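As a rough illustration, the filters above might look something like this in Python. This is only a sketch: the field names (season, mp, g, pos), the sample rows, and the rule of keeping the first listed position are all assumptions made for the example, not the actual data or method used in the project.

```python
# Hypothetical rows: field names and values are made up for illustration.
rows = [
    {"player": "A", "season": 1996, "mp": 2400, "g": 80, "pos": "PG"},
    {"player": "B", "season": 2005, "mp": 1200, "g": 70, "pos": "SG"},
    {"player": "C", "season": 2010, "mp": 1000, "g": 28, "pos": "C"},
    {"player": "D", "season": 2015, "mp": 2100, "g": 75, "pos": "SF-PF"},
    {"player": "E", "season": 2024, "mp": 1900, "g": 60, "pos": "PF"},
]


def keep(row):
    """Apply the season, minutes, and games filters described above."""
    return (
        1997 <= row["season"] <= 2023  # complete seasons only
        and row["mp"] > 1500           # drop low-minute bench players
        and row["g"] > 30              # drop seasons with too few games
    )


def primary_position(pos):
    """Assumed rule: keep the first listed position, e.g. 'SF-PF' -> 'SF'."""
    return pos.split("-")[0]


# Keep the rows that pass every filter and normalize their position.
filtered = [
    {**row, "pos": primary_position(row["pos"])}
    for row in rows
    if keep(row)
]
print(filtered)
```

Only the made-up player D passes all three filters here, and the two-position label is reduced to a single position.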

Interesting Analyses

Almost all of the variables I tested against PER ended up being strongly correlated with it, but here are some results I found pretty interesting. This one plots PER against the percentage of a player’s shot attempts that come from the three point line. Surprisingly, the more three point shots centers take, the greater their PER, whereas PER for the other positions decreases dramatically.
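One way to check a relationship like this is to compute a Pearson correlation between three point attempt rate and PER separately for each position. The sketch below does that in plain Python. The numbers are invented purely to mirror the direction of the trend described above (they are not values from the dataset), and the position labels and variable names are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up (three point attempt rate, PER) pairs, for illustration only.
samples = {
    "C": [(0.05, 15.0), (0.10, 16.0), (0.20, 18.5), (0.30, 19.0)],
    "SG": [(0.20, 17.0), (0.35, 15.5), (0.45, 14.0), (0.55, 13.0)],
}

# One correlation per position.
by_position = {
    pos: pearson([rate for rate, _ in pts], [per for _, per in pts])
    for pos, pts in samples.items()
}
for pos, r in by_position.items():
    print(pos, round(r, 3))
```

A positive coefficient means PER rises with three point attempt rate for that position, and a negative one means it falls.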

This graph is cool because we see that over time, Centers and Point Guards have gotten “better”, meaning they have a larger average PER. Power Forwards, Small Forwards, and Shooting Guards have all gotten “worse”, meaning they have a lower average PER.

Modeling

I made several models, and as you can see from the comparison graph below, model 4 seems to fit my data the best for predicting PER.

The PER formula for m4 is experience + g + mp + ts_percent + x3p_ar + f_tr + orb_percent + drb_percent + trb_percent + ast_percent + stl_percent + blk_percent + tov_percent + usg_percent + ows + dws + ws + ws_48 + obpm + dbpm + bpm + vorp

8 Year Sample

Here is a graph of PER for players who have 8 years of experience. It shows each player’s PER for every season up to their 8th, along with a trend line for each player.

This is just a sample of what the data looks like.
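A per-player trend line like the one in the graph can be sketched as an ordinary least-squares line through each player's (experience, PER) points. The code below is a minimal example under that assumption. The player names and PER values are hypothetical, and the plot may well have used a different smoothing method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


experience = list(range(1, 9))  # seasons 1 through 8

# Hypothetical PER values for two made-up players, one per season.
per_by_player = {
    "player_x": [12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0],
    "player_y": [18.0, 19.5, 20.0, 21.0, 20.5, 19.0, 18.5, 17.0],
}

# One trend line per player.
trends = {name: fit_line(experience, per) for name, per in per_by_player.items()}
for name, (slope, intercept) in trends.items():
    print(name, round(slope, 3), round(intercept, 3))
```

A positive slope means the player's PER trended upward over those 8 seasons. A straight line cannot show a peak in the middle of a career, which is one reason to look at the season-by-season values as well.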

Modeling 2

I experimented with the caret library, which stands for Classification and REgression Training, to learn more about model training. I set a random seed so that the split of my data into training and testing sets would be reproducible, set up a linear model with the formula from model 4, and predicted the peak season for each NBA player.
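The two ideas in this step, a seeded train/test split and picking the peak season out of a set of predictions, can be sketched in plain Python as follows. This is not caret and not the code used for the project: the 80/20 split, the seed value, and the helper names are assumptions chosen for the example, and the linear model fit itself is left out.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed so the same split comes back every run."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def peak_experience(predicted_per):
    """Return the experience year with the highest predicted PER.

    predicted_per maps experience year -> predicted PER for one player.
    """
    return max(predicted_per, key=predicted_per.get)


player_ids = list(range(10))  # hypothetical player ids
train, test = train_test_split(player_ids)
print(len(train), len(test))

# Hypothetical predictions for one player.
print(peak_experience({1: 12.0, 2: 15.5, 3: 14.0}))
```

Calling train_test_split again with the same seed returns the same split, which is the point of setting the seed before splitting.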

Difference Graphs

This first graph is a really good representation of how accurate my predictions are compared to the last.

This graph is a really nice, easy-to-read indicator of how far off my predictions were, and how often.
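The "how far off, and how often" idea boils down to subtracting the actual peak season from the predicted one for each player and counting how many times each difference shows up. Here is a minimal sketch. The player names and peak values are made up, and the sign convention (predicted minus actual) is an assumption.

```python
from collections import Counter

# Hypothetical peak experience years, keyed by a made-up player name.
predicted_peak = {"p1": 5, "p2": 7, "p3": 4, "p4": 6, "p5": 5}
actual_peak = {"p1": 5, "p2": 6, "p3": 6, "p4": 6, "p5": 4}

# Positive means the prediction was too late, negative means too early.
differences = {
    player: predicted_peak[player] - actual_peak[player] for player in actual_peak
}

# Count how often each difference occurs.
tally = Counter(differences.values())
for diff in sorted(tally):
    print(diff, tally[diff])
```

A bar chart of this tally gives the kind of picture described above, with a tall bar at 0 meaning many predictions were exactly right.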

I also learned about the plotly package, which is primarily used for interactive data visualization.

Results

Here are our final results laid out in number and graph format.
column                 n         mean         sd  median      trimmed   mad   min   max  range        skew  kurtosis        se
player_id              6  2986.166667  16.773988    2988  2986.166667  14.5  2967  3004     37  -0.0952759  1.198705  6.847952
max_PER_experience     6     5.333333   2.875181       5     5.333333   0.5     1    10      9   0.1884518  2.904787  1.173788
first_experience       6     1.000000   0.000000       1     1.000000   0.0     1     1      0         NaN       NaN  0.000000
experience_difference  6     4.333333   2.875181       4     4.333333   0.5     0     9      9   0.1884518  2.904787  1.173788
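As a quick sanity check on the table, the se column is consistent with the usual standard error formula, sd divided by the square root of n. The sketch below recomputes it from the reported n and sd values. It only uses numbers already shown in the table, and the variable names are mine.

```python
from math import sqrt

n = 6  # reported sample size for every row

# (sd, se) pairs copied from the table above.
reported = {
    "player_id": (16.773988, 6.847952),
    "max_PER_experience": (2.875181, 1.173788),
    "experience_difference": (2.875181, 1.173788),
}

# Recompute the standard error as sd / sqrt(n) for each row.
computed_se = {name: sd / sqrt(n) for name, (sd, _) in reported.items()}
for name, (_, se) in reported.items():
    print(name, round(computed_se[name], 6), se)
```

The mean of experience_difference (4.333333) also equals the mean of max_PER_experience (5.333333) minus the mean of first_experience (1.000000), which is what you would expect if it is the gap between a player's first season and peak season.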